Improving Resource Utilization in Heterogeneous CPU-GPU Systems

Authors

  • Michael Boyer
  • Kevin Skadron
Abstract

Graphics processing units (GPUs) have attracted enormous interest over the past decade due to substantial increases in both performance and programmability. Programmers can potentially leverage GPUs for substantial performance gains, but at the cost of significant software engineering effort. In practice, most GPU applications do not effectively utilize all of the available resources in a system: they either fail to use a resource at all or use a resource to less than its full potential. This underutilization can hurt both performance and energy efficiency. In this dissertation, we address the underutilization of resources in heterogeneous CPU-GPU systems in three different contexts. First, we address the underutilization of a single GPU by reducing CPU-GPU interaction to improve performance. We use as a case study a computationally intensive video-tracking application from systems biology. Because of the high cost of CPU-GPU coordination, our initial, straightforward attempts to accelerate this application failed to effectively utilize the GPU. By leveraging some non-obvious optimization strategies, we significantly decreased the amount of CPU-GPU interaction and improved the performance of the GPU implementation by 26x relative to the best CPU implementation. Based on the lessons we learned, we present general guidelines for optimizing GPU applications as well as recommendations for system-level changes that would simplify the development of high-performance GPU applications. Next, we address underutilization at the system level by using load balancing to improve performance. We propose a dynamic scheduling algorithm that automatically and efficiently


Similar resources

Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU

Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is on the order of seconds, tomosynthesis systems can be used to perform tomosynthesis-guided interventional procedures. This research has been designed to study u...


A Study of Scheduling a Neuro-imaging Application On a Heterogeneous CPU-GPU Cluster by Reza Nakhjavani

A Study of Scheduling a Neuro-imaging Application On a Heterogeneous CPU-GPU Cluster Reza Nakhjavani Master of Applied Science Graduate Department of Electrical and Computer Engineering University of Toronto 2014 The ever increasing complexity of scientific applications has led to utilization of new HPC paradigms such as Graphical Processing Units (GPUs). However, modifying applications to run ...


Integer programming based heterogeneous CPU-GPU cluster schedulers for SLURM resource manager

We present two integer-programming-based heterogeneous CPU-GPU cluster schedulers, called IPSCHED and AUCSCHED, for the widely used SLURM resource manager. Our scheduler algorithms take windows of jobs and solve allocation problems in which free CPU cores and GPU cards are allocated collectively to jobs so as to maximize some objective functions. Our AUCSCHED scheduler employs an auction based ...


A hybrid computing method of SpMV on CPU-GPU heterogeneous computing systems

Sparse matrix–vector multiplication (SpMV) is an important issue in scientific computing and engineering applications. The performance of SpMV can be improved using parallel computing. The implementation and optimization of SpMV on GPU are research hotspots. Due to some irregularities of sparse matrices, the use of a single compression format is not satisfactory. The hybrid storage format can exp...


Parallel Processing of Multimedia Data in a Heterogeneous Computing Environment

Many multimedia applications can now be parallelized on multicore platforms such as CPUs and GPUs. In this paper, we propose a parallel processing approach for a multimedia application that uses both the CPU and the GPU. Instead of distributing the parallelizable workload to either CPU or GPU (i.e., homogeneous computing), we distribute the workload simultaneously into both CPU and GPU (i.e., he...



Publication date: 2013